#Introduction to the RCSB Protein Data Bank (PDB)
Downloaded the following CSV file from the PDB site.
db <- read.csv("Data Export Summary.csv", row.names = 1)
head(db)
##                          X.ray   NMR   EM Multiple.methods Neutron Other  Total
## Protein (only)          142303 11804 5999              177      70    32 160385
## Protein/Oligosaccharide   8414    31  979                5       0     0   9429
## Protein/NA                7491   274 1986                3       0     0   9754
## Nucleic acid (only)       2368  1372   60                8       2     1   3811
## Other                      149    31    3                0       0     0    183
## Oligosaccharide (only)      11     6    0                1       0     4     22
Q1: What percentage of structures in the PDB are solved by X-ray and electron microscopy?
round(sum(db$X.ray) / sum(db$Total) * 100, 2)  # percent of all entries solved by X-ray
## [1] 87.55
round(sum(db$EM) / sum(db$Total) * 100, 2)  # percent of all entries solved by EM
## [1] 4.92
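The two calculations above can also be done for every method at once. This is a minimal sketch, not part of the original worksheet: it rebuilds `db` from the values printed in the table so that it runs on its own, and `method_pct` is a name introduced here for illustration.

```r
# Values copied from the summary table printed above, so the sketch runs
# without the CSV file.
db <- data.frame(
  X.ray            = c(142303, 8414, 7491, 2368, 149, 11),
  NMR              = c(11804, 31, 274, 1372, 31, 6),
  EM               = c(5999, 979, 1986, 60, 3, 0),
  Multiple.methods = c(177, 5, 3, 8, 0, 1),
  Neutron          = c(70, 0, 0, 2, 0, 0),
  Other            = c(32, 0, 0, 1, 0, 4),
  Total            = c(160385, 9429, 9754, 3811, 183, 22),
  row.names = c("Protein (only)", "Protein/Oligosaccharide", "Protein/NA",
                "Nucleic acid (only)", "Other", "Oligosaccharide (only)")
)

# colSums() gives the number of entries per method; dividing by the grand
# total turns every column into a percentage in one step.
method_pct <- round(colSums(db) / sum(db$Total) * 100, 2)

# Pull out the two methods asked about in Q1.
method_pct[c("X.ray", "EM")]
```

The `Total` entry of `method_pct` should come out as 100, which is a quick check that the column sums are consistent.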
Q2: What proportion of structures in the PDB are protein?
round(db$Total[1] / sum(db$Total) * 100, 2)  # row 1 is "Protein (only)"
## [1] 87.36
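The answer above counts only the "Protein (only)" row. As a hedged sketch, the same calculation can be run for every molecular type, and for all entries that contain protein at all. The vector `totals` is copied from the `Total` column of the table; the names `type_pct` and `protein_rows`, and the choice to treat the first three rows as "protein-containing", are assumptions made here rather than part of the original worksheet.

```r
# Row totals copied from the "Total" column of the summary table above.
totals <- c(
  "Protein (only)"          = 160385,
  "Protein/Oligosaccharide" = 9429,
  "Protein/NA"              = 9754,
  "Nucleic acid (only)"     = 3811,
  "Other"                   = 183,
  "Oligosaccharide (only)"  = 22
)

# Percentage of all PDB entries in each molecular type.
type_pct <- round(totals / sum(totals) * 100, 2)
type_pct["Protein (only)"]

# Assumption: an entry "is protein" if it contains any protein, which
# covers the first three rows of the table.
protein_rows <- c("Protein (only)", "Protein/Oligosaccharide", "Protein/NA")
protein_any_pct <- round(sum(totals[protein_rows]) / sum(totals) * 100, 2)
protein_any_pct
```

Which of the two figures answers Q2 depends on how "are protein" is read; the original answer uses the stricter "Protein (only)" reading.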
Q3: Type HIV in the PDB website search box on the home page and determine how many HIV-1 protease structures are in the current PDB?
#Visualizing the HIV-1 protease structure
Q4: Water molecules normally have 3 atoms. Why do we see just one atom per water molecule in this structure?
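Only the oxygen atom of each water molecule is shown. Hydrogen has a single electron and scatters X-rays very weakly, so the two hydrogen atoms cannot be resolved unless the structure is solved at very high resolution.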
Q5: There is a conserved water molecule in the binding site. Can you identify this water molecule? What residue number does this water molecule have (see note below)?